

Search results for all records where Creators/Authors contains: "Chen, Ke"


  1. Abstract

    Background

    Many bioinformatics applications involve bucketing a set of sequences, where each sequence may be assigned to multiple buckets. To achieve both high sensitivity and high precision, a bucketing method should place similar sequences in the same bucket while placing dissimilar sequences in distinct buckets. Existing k-mer-based bucketing methods are efficient on sequencing data with low error rates but suffer much reduced sensitivity on data with high error rates. Locality-sensitive hashing (LSH) schemes can mitigate this issue by tolerating edits between similar sequences, but state-of-the-art methods still have large gaps.

    Results

    In this paper, we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) to a subset of buckets, is defined to be $(d_1, d_2)$-sensitive if any two sequences within edit distance $d_1$ are mapped to at least one shared bucket, and any two sequences at distance at least $d_2$ are mapped to disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions for a variety of values of $(d_1, d_2)$ and analyze their efficiency with respect to both the total number of buckets needed and the number of buckets to which a single sequence is mapped. We also prove lower bounds on these two parameters in different settings and show that some of our constructed LSB functions are optimal.
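    Written out symbolically, the definition reads as follows. The notation here is ours, chosen for illustration: $\Sigma^n$ is the set of length-$n$ sequences, $\mathcal{B}$ the set of buckets, $f$ the bucketing function, and $\operatorname{ed}$ the edit distance.

$$
f:\Sigma^n \to 2^{\mathcal{B}} \text{ is } (d_1,d_2)\text{-sensitive} \iff \forall\, s,t\in\Sigma^n:\;
\begin{cases}
\operatorname{ed}(s,t)\le d_1 \implies f(s)\cap f(t)\neq\emptyset,\\[2pt]
\operatorname{ed}(s,t)\ge d_2 \implies f(s)\cap f(t)=\emptyset.
\end{cases}
$$

    In this notation the two efficiency parameters correspond to the number of buckets in $\mathcal{B}$ and the size $|f(s)|$ of a single sequence's bucket set; an ordinary single-bucket LSH function is the special case $|f(s)| = 1$ for every $s$.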

    Conclusion

    These results lay the theoretical foundations for their practical use in analyzing sequences with high error rates, while also providing insights into the hardness of designing ungapped LSH functions.

     
  2. Abstract

    Motivation

    Modern methods for computation-intensive tasks in sequence analysis (e.g., read mapping, sequence alignment, and genome assembly) often first transform each sequence into a list of short, regular-length seeds so that compact data structures and efficient algorithms can be employed to handle the ever-growing volume of data. Seeding methods using k-mers (substrings of length k) have been tremendously successful in processing sequencing data with low mutation/error rates. However, they are much less effective on sequencing data with high error rates, because k-mers cannot tolerate errors: a single error alters every k-mer that overlaps it.
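    To illustrate why exact k-mers are fragile, the sketch below (our own illustration; the sequence, k, and error position are arbitrary choices) substitutes one base and counts how many position-aligned k-mers survive. Every k-mer covering the substituted position changes, so one error can destroy up to k k-mers. Under the common simplifying assumption of independent substitution errors at rate e, a given k-mer is error-free with probability (1 - e)^k.

```python
def kmers(s: str, k: int) -> list[str]:
    """All length-k substrings of s, in order of start position."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def substitute(s: str, pos: int, base: str) -> str:
    """Return s with the character at index pos replaced by base."""
    return s[:pos] + base + s[pos + 1:]


def surviving_kmers(a: str, b: str, k: int) -> int:
    """Count start positions whose k-mers are identical in two equal-length strings."""
    return sum(x == y for x, y in zip(kmers(a, k), kmers(b, k)))


def error_free_kmer_prob(error_rate: float, k: int) -> float:
    """P(a k-mer contains no error), assuming independent substitution errors."""
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    read = "ACGTTGCAAGTC"              # 12 bases -> 8 k-mers for k = 5
    noisy = substitute(read, 6, "T")    # one substitution at index 6 (C -> T)
    print(surviving_kmers(read, noisy, 5))             # 3 (of 8 k-mers)
    print(round(error_free_kmer_prob(0.10, 21), 3))    # 0.109
```

    At a 10% error rate, roughly 89% of 21-mers would contain at least one error under this model, which is why exact k-mer matching loses sensitivity on noisy long reads.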

    Results

    We propose SubseqHash, a strategy that uses subsequences, rather than substrings, as seeds. Formally, SubseqHash maps a string of length n to its smallest subsequence of length k, k < n, according to a given order over all length-k strings. Finding the smallest subsequence of a string by enumeration is impractical because the number of subsequences grows exponentially. To overcome this barrier, we propose a novel algorithmic framework that consists of a specifically designed order (termed the ABC order) and an algorithm that computes the smallest subsequence under an ABC order in polynomial time. We first show that the ABC order exhibits the desired property and that the probability of hash collision under the ABC order is close to the Jaccard index. We then show that SubseqHash overwhelmingly outperforms substring-based seeding methods in producing high-quality seed matches for three critical applications: read mapping, sequence alignment, and overlap detection. SubseqHash presents a major algorithmic breakthrough for tackling high error rates, and we expect it to be widely adopted for long-read analysis.
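    The definition of SubseqHash can be illustrated with a brute-force sketch. Two loud assumptions: it swaps in a generic pseudo-random hash order (seeded BLAKE2b digests) in place of the ABC order, which is not reproduced here, and it enumerates all C(n, k) subsequences, which is exactly the exponential approach the abstract calls impractical. It is therefore only useful for tiny n. The "Jaccard index" is assumed here to be that of the two strings' sets of length-k subsequences.

```python
import hashlib
from itertools import combinations


def subsequences(s: str, k: int) -> set[str]:
    """All distinct length-k subsequences of s (C(n, k) index choices: exponential)."""
    return {"".join(chars) for chars in combinations(s, k)}


def order_key(x: str, order_id: int) -> tuple[bytes, str]:
    """Pseudo-random total order on strings, one order per order_id.

    Stand-in assumption: a generic hash-based order, NOT the paper's ABC order.
    The string itself breaks (astronomically unlikely) digest ties.
    """
    digest = hashlib.blake2b(x.encode(), digest_size=8,
                             salt=order_id.to_bytes(16, "little")).digest()
    return digest, x


def subseq_hash_bruteforce(s: str, k: int, order_id: int) -> str:
    """Smallest length-k subsequence of s under the chosen order, by enumeration."""
    return min(subsequences(s, k), key=lambda x: order_key(x, order_id))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|."""
    return len(a & b) / len(a | b)


def collision_rate(s: str, t: str, k: int, trials: int = 2000) -> float:
    """Fraction of random orders under which s and t receive the same seed."""
    hits = sum(subseq_hash_bruteforce(s, k, i) == subseq_hash_bruteforce(t, k, i)
               for i in range(trials))
    return hits / trials


if __name__ == "__main__":
    s, t, k = "ACGTAC", "ACGAAC", 4
    print(jaccard(subsequences(s, k), subsequences(t, k)))
    print(collision_rate(s, t, k))  # should be close to the Jaccard index above
```

    With a uniformly random order, s and t collide exactly when the minimum of the union of their subsequence sets lies in the intersection, so the collision probability equals the Jaccard index (the MinHash argument). The abstract's claim is that the ABC order keeps collisions close to this ideal while allowing a polynomial-time search instead of this enumeration.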

    Availability and implementation

    SubseqHash is freely available at https://github.com/Shao-Group/subseqhash.

     
  3. Road safety has always been a crucial priority for municipalities, as vehicle accidents claim lives every day. Recent rapid improvements in video collection and processing technologies enable traffic researchers to identify and alleviate potentially dangerous situations. This paper presents methods by which conflict hotspots can be detected under a variety of situations and conditions. Both pedestrian–vehicle and vehicle–vehicle conflict hotspots can be discovered, and we present an original technique for encoding additional information in the graphs using shapes. Together, conflict hotspot detection, volume hotspot detection, and intersection-service evaluation allow us to understand safety and performance issues and to test countermeasures comprehensively. The selection of appropriate countermeasures is demonstrated through an extensive analysis and discussion of two intersections in Gainesville, Florida, USA. Equally important is evaluating the efficacy of those countermeasures. This paper advocates selecting from a menu of countermeasures at the municipal level, with safety as the top priority. Performance is also considered, and we present a novel concept of a performance–safety trade-off at intersections.
    Boucher, Christina; Rahmann, Sven (Eds.)
    Many bioinformatics applications involve bucketing a set of sequences, where each sequence may be assigned to multiple buckets. To achieve both high sensitivity and high precision, a bucketing method should place similar sequences in the same bucket while placing dissimilar sequences in distinct buckets. Existing k-mer-based bucketing methods are efficient on sequencing data with low error rates but suffer much reduced sensitivity on data with high error rates. Locality-sensitive hashing (LSH) schemes can mitigate this issue by tolerating edits between similar sequences, but state-of-the-art methods still have large gaps. Here we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) to a subset of buckets, is defined to be (d₁, d₂)-sensitive if any two sequences within edit distance d₁ are mapped to at least one shared bucket, and any two sequences at distance at least d₂ are mapped to disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions for a variety of values of (d₁, d₂) and analyze their efficiency with respect to both the total number of buckets needed and the number of buckets to which a single sequence is mapped. We also prove lower bounds on these two parameters in different settings and show that some of our constructed LSB functions are optimal. These results provide theoretical foundations for their practical use in analyzing sequences with high error rates, while also providing insights into the hardness of designing ungapped LSH functions.
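    The (d₁, d₂)-sensitivity condition can be checked exhaustively for short sequences. The sketch below is our own illustration, not necessarily one of the paper's constructions: the hypothetical bucketing function `deletion_buckets` maps a length-n sequence to every length-(n - 1) string obtained by deleting one character. It should be (1, 3)-sensitive: two equal-length sequences at edit distance 1 differ by one substitution, and deleting that position yields a shared bucket; conversely, a shared bucket u means each sequence is u with one character inserted, so their edit distance is at most 2.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def deletion_buckets(s: str) -> frozenset[str]:
    """Hypothetical LSB function: all strings obtained by deleting one character of s."""
    return frozenset(s[:i] + s[i + 1:] for i in range(len(s)))


def is_sensitive(bucket_fn, d1: int, d2: int, n: int, alphabet: str = "ACGT") -> bool:
    """Exhaustively test (d1, d2)-sensitivity of bucket_fn on all length-n strings."""
    seqs = ["".join(p) for p in product(alphabet, repeat=n)]
    buckets = {s: bucket_fn(s) for s in seqs}
    for i, s in enumerate(seqs):
        for t in seqs[i + 1:]:
            d = edit_distance(s, t)
            shared = not buckets[s].isdisjoint(buckets[t])
            if d <= d1 and not shared:  # close pair must share a bucket
                return False
            if d >= d2 and shared:      # far pair must not share any bucket
                return False
    return True


if __name__ == "__main__":
    print(is_sensitive(deletion_buckets, d1=1, d2=3, n=4))  # True
    print(is_sensitive(deletion_buckets, d1=1, d2=2, n=4))  # False (e.g. ACGT vs CGTA)
```

    For this construction the two efficiency parameters are easy to read off: it uses |Σ|^(n-1) possible buckets in total, and each sequence lands in at most n of them. The (1, 2) check fails because, for example, ACGT and CGTA are at edit distance 2 yet both contain CGT.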
  5. Intercalated layered materials offer distinctive properties and serve as precursors for important two-dimensional (2D) materials. However, intercalation of non–van der Waals structures, which can expand the family of 2D materials, is difficult. We report a structural editing protocol for layered carbides (MAX phases) and their 2D derivatives (MXenes). The gap-opening and species-intercalating stages were mediated by chemical scissors and intercalants, respectively, creating a large family of MAX phases with unconventional elements and structures, as well as MXenes with versatile terminals. Removing the terminals of MXenes with metal scissors and then stitching the 2D carbide nanosheets by atom intercalation leads to the reconstruction of MAX phases and a family of metal-intercalated 2D carbides, both of which may drive advances in fields ranging from energy to printed electronics.

     
  6. As part of road safety initiatives, surrogate road safety approaches have gained popularity thanks to rapid advances in video collection and processing technologies. This paper presents an end-to-end software pipeline for processing traffic videos and running a safety analysis based on surrogate safety measures. We developed algorithms and software to determine trajectory movements and phases that, when combined with signal timing data, enable accurate event detection and categorization by conflict type for both pedestrian-vehicle and vehicle-vehicle interactions. Using this information, we introduce a new surrogate safety measure, “severe event,” which is quantified by multiple metrics recorded during the event: existing measures such as time-to-collision (TTC) and post-encroachment time (PET), as well as deceleration and speed. We present an efficient multistage event-filtering approach followed by a multi-attribute decision tree algorithm that prunes the extensive set of conflicting interactions to a robust set of severe events. This pipeline was used to process traffic videos from several intersections in multiple cities to measure and compare pedestrian and vehicle safety. Detailed experimental results demonstrate the effectiveness of the pipeline.
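    TTC and PET have standard textbook definitions, sketched below in their simplest form: TTC for two vehicles in the same lane at constant speeds, and PET as the time gap between one road user leaving a conflict area and another entering it. This is an illustration only; the paper derives these measures from video trajectories, and its exact formulas and the thresholds that define a “severe event” are not given in this abstract. Function and parameter names are hypothetical.

```python
import math


def time_to_collision(gap_m: float, v_follow_mps: float, v_lead_mps: float) -> float:
    """Constant-speed car-following TTC: bumper-to-bumper gap / closing speed.

    Returns math.inf when the follower is not closing in (no collision course).
    """
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def post_encroachment_time(first_exit_s: float, second_entry_s: float) -> float:
    """PET: time from the first road user leaving the conflict area to the second entering it."""
    return second_entry_s - first_exit_s


if __name__ == "__main__":
    print(time_to_collision(gap_m=20.0, v_follow_mps=15.0, v_lead_mps=10.0))  # 4.0
    print(post_encroachment_time(first_exit_s=12.0, second_entry_s=13.5))    # 1.5
```

    Smaller values of either measure indicate a closer call, which is why both serve as surrogate indicators of crash risk.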
  7. A selenophene-containing conjugated organic ligand, 2-(4′-methyl-5′-(5-(3-methylthiophen-2-yl)selenophen-2-yl)-[2,2′-bithiophen]-5-yl)ethan-1-aminium (STm), was synthesized and incorporated into a Sn(II)-based two-dimensional perovskite, (STm)₂SnI₄. The band offset between the perovskite and ligand can be fine-tuned by introducing the STm ligand. Both field-effect transistor and light-emitting diode devices based on (STm)₂SnI₄ films exhibit high performance and enhanced operational stability.
  8. Abstract

    Second sound refers to the phenomenon of heat propagation as temperature waves in the phonon hydrodynamic transport regime. We directly observe second sound in graphite at temperatures above 200 K using a sub-picosecond transient grating technique. The experimentally determined dispersion relation shows the thermal-wave velocity increasing with decreasing grating period, consistent with a first-principles-based solution of the Peierls-Boltzmann transport equation. Through simulation, we attribute this increase to thermal zero sound, i.e., thermal waves due to ballistic phonons. Our experimental findings are well explained by the interplay among three groups of phonons: ballistic, diffusive, and hydrodynamic. Our ab initio calculations further predict a large isotope effect on the properties of thermal waves and the existence of second sound at room temperature in isotopically pure graphite.

     